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We construct an efficient autonomous quantum-circuit design algorithm for creating efficient quan- 
tum circuits to simulate Hamiltonian many-body quantum dynamics for arbitrary input states. The 
resultant quantum circuits have optimal space complexity and employ a sequence of gates that is 
close to optimal with respect to time complexity. We also devise an algorithm that exploits commu- 
tativity to optimize the circuits for parallel execution. As examples, we show how our autonomous 
algorithm constructs circuits for simulating the dynamics of Kitaev's honeycomb model and the 
C^l ■ Bardeen-Cooper-Schrieffer model of superconductivity. 
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I. INTRODUCTION 

Feynman proposed quantum computing as a means to "imitate" quantum dynamics in order to overcome apparent 
intractability of universal quantum simulation on classical computers He conjectured that a universal quantum 
simulator (UQS) could efficiently simulate quantum evolution. Lloyd formalized Feynman's concept by employing a 
i— i- Trotter ordered-operator expansion to convert continuous time evolution into a quantum circuit C comprising unitary 
quantum gates Q. The UQS is now also referred to as "digital quantum simulation", both theoretically an d 
^4 experimentally Q. The adjective "digital" is used to contrast with the term "analogue quantum simulation", which 
aims to emulate evolution of a Hamiltonian H in a custom-designed experiment f^-tllj . The importance and near- 
future feasibility of the (digital) UQS, albeit without quantum error correction, drives experimental efforts to create 
^ \ these simulators for restricted types of H Q ■ 
O" 1 We present the first autonomous algorithm to design circuits for simulating the evolution generated by a general 
7i.-qubit fc-local Hamiltonian H^ n > within a pre-specified tolerance e. An n-qubit fc-local H^ n > is defined to be a linear 
combination of m Hamiltonians t)j , each acting on n qubits as an identity operator 1 on all but fc G polylog(n) 
qubits and polylog(n) is a polynomial function of log n. Our tolerance e is the worst-case 2-norm distance between 
the true evolved state under the specified evolution and the simulated output state, maximized over all allowed input 
states. 

In our analysis, we consider two independent cases of universal gate sets. One case corresponds to a finite set 
comprising a single two-qubit entangling gate plus a finite number of one-qubit gates. The second case incorporates 
a single two-qubit entangling gate plus both discrete and continuously-parameterized single-qubit gates. Strictly 
speaking only a finite gate set should be permitted for quantum error correction and scalability, but there is a trend in 
experimental studies to report quantum simulation with continuously-parametrized single-qubit gates. We want our 
algorithm to be relevant both the the strict case of a finite gate set and to experimental efforts that employ continuous 
tunability. 

Our resultant circuits are not only efficient (meaning that the circuit size scales polynomially with the number of 
' simulated qubits for fixed fc) but also uses the smallest number of qubits possible given the size of the system being 
simulated. Additionally, the circuit size also scales near-optimally with the run-time t of the simulation. 

This minimization over space and time costs (number of qubits required in the simulator and number of gates) is 
important for making quantum simulators as close as possible to practical implementation. Each additional qubit 
and each additional gate can be challenging to implement in practice so reducing these costs is not only important 
for proving that the scaling is efficient hence possible in principle but also to reduce the costs to make the simulation 
feasible with small simulators in the near future. Our minimum run-time algorithm is also improved by parallelizing 
gates by grouping commuting terms in the Trotter decomposition of the evolutionary operator, thus enhancing the 
near-term feasibility of the quantum simulator. 

Our work thus enables feasibility of UQS circuits. Experimental UQS circuits will be valuable to predict resultant 
states under Jj-evolution or to provide UQS-generated states as inp uts to quantum algorithms for purposes such 
as acquiring spectral properties or eigenstates of Hamiltonians [l3( or to determine particle scattering, e.g. in a 
relativistic quantum field theory p^Tl5|. T he g round state is regarded as the most important eigenstate as it 
uniquely determines all properties of the system [1 6I | and could be used to solve outstanding condensed-matter problems 
such as determining the energy gap for general Bardeen-Cooper-Schrieffer (BCS) superconductivity models of finite 
systems [l7j . Dynamical simulation algorithms have been devised to simulate ground-state properties via adiabatic 
state preparation (f7j or via dissipative interaction with the environment [H, HI • 
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II. ALGORITHM FOR DESIGNING THE QUANTUM SIMULATOR CIRCUIT 

A. Concept 

Our algorithm yields a string representing a quantum circuit that comprises a sequence of quantum gates to simulate 
fc-local Hamiltonian evolution within fixed 2-norm distance e. The n-qubit fc-local Hamiltonian is expressed in the 
Pauli operator basis as 
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The total number of non-identity Pauli operators in each tyy' is at most an n- independent constant fc. 

Our algorithm is designed to produce a poly(n)-size description of a quantum circuit that implements a unitary 
U^{t) such that 



exp 



< e, 



(3) 



where || • || denotes the 2-norm. This condition implies that the trace distance between the ideal evolution and the 
simulated evolution is at most e for any initial state (l8l. [l9j. Below we discuss the algorithmic input, processing and 
output. 



B. Input 

The algorithm requires the following inputs: 
n: number of qubits in the system; 
k: locality parameter of the Hamiltonian; 
t: evolution time for the simulation; 

e: worst-case 2-norm error tolerance (distance) between the true evolved state and the simulated state; 

ru; specifies which single-qubit gate set to use, namely {H, T = Z^ 1 / 4 } or the continuously parametrized set 
{H,R Z (6)}, withZ = R z (n/2); 



#(*•) 



bit-string representation of the n-qubit fc-local Hamiltonian. 



The Hamiltonian 



#(n) 



is entered into the algorithm as a bit string no larger than a poly(n)-size representation for 



this input to be efficient. Our algorithm accepts the Hamiltonian H^ n > input as the bit-string representation 



{(a,j,lj,Sj);j = l,...,m}. 



with 



lj - (IxjJrjJzj) 

" (n) 

the vector corresponding to the numbers of each type of Pauli operators in f)j and 

Sj = {Sxj,S Y j,S Z j) 



(4) 
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with Sxj, Syj and Szj the strings corresponding to the positions of each of the X, Y and Z operators respectively 

£(«) 

In Eq. (jj), substrings representing Hamiltonians f)^- appear as triplets of strings. These triplets are delimited 

by parentheses thus ensuring easy parsing of the overall string for H( n ' . Whereas a general matrix representation 

of H ^ is exponentially large in n, our representation of this k- local Hamiltonian has poly(n) size. 

As an example of this encoding, consider the three-qubit Hamiltonian 

jj(n=3) = X®I®l + 2y®y®l + 4F®l®Z (7) 
with m = 3. This Hamiltonian is representated as 
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and all other strings are empty. 

Our algorithm can compute the number of time steps r of equal duration - so that the evolution from to t is 
broken down into a sequence of finite steps. Furthermore our algorithm employs the Trotter-Suzuki (TS) ordered- 
exponential decomposition fl8U2~i| of order \ 1° simulate the evolution sequentially over each of the r time steps. 
However, our algorithm determines r and x based on optimizing an upper bound on tolerance, and superior choices 
of r and \ are possible but not by using our algorithm or any other known algorithm. 

As the user of our algorithm might wish to employ or test other choices of r and Xi we design the algorithm so 
that the user can override our choices of r and x- H our algorithmically determined values of r and x ar e overridden, 
the guarantee that the quantum simulations yields a resultant state within tolerance e no longer holds. Therefore, 
overriding the algorithm's r and x comes with the warning that the error in the quantum simulation is unknown. 
Specifically setting r = and x = causes our algorithm to determine optimal r and x values based on minimizing 
the number of gates required to guarantee that the simulated state has 2-norm error less than e for any input state. 
Positive values of r and x override our algorithmic determination of these parameters and use the input values instead. 

The purpose of the input variable zu is to enable circuit design that strictly uses a finite gate set in accordance with 
principles of quantum error correction and scalability |22j or to include a continuously-parametrized single-qubit gate, 
namely rotations by 9 around the z-axis using unitary operator R z (9), in accordance with prevalent experimental 
practice 0. More generally one could consider the case that vo is any desired universal gate set, but here we restrict 
our attention to just two single-qubit gates, either {H, T = Z^ 1 / 4 } or {H, R z (9)}, so w is a binary, or logical, variable. 
In our algorithm, both of these sets are accompanied by two-qubit controlled-not gate CNOT,^ with the i labelling 
the control qubit and j labelling the target qubit. 



C. Output 

Our algorithm yields an output string [C] that represents the resultant quantum circuit. This quantum circuit is a 
sequence of quantum gates that simulate the dynamics of the quantum system for an arbitrary input state. In our 
algorithm the basic quantum gates are members either of the finite-size universal instruction set {H, T, CNOT} or 
the continuously-parametrized set {H, R z {0), CNOT}. 

The single qubit gates are represented in the output as a string of the form u Hx" or "Tx" with x a bit string 
labeling the qubit acted upon by gate T or H, respectively. These gates, along with the identity 1, which is not 
explicitly stated in the circuit, can be parallelized and concatenated to form circuits. For example, the string H1T2 
represents a circuit that first performs the Hadamard operation on qubit 1 then a 7r/8-gate on qubit 2. The CNOT 
operation is represented similarly with the string "CNOTx,y" representing the quantum operation CNOT^. For the 
case of the gate set that includes continuously parameterized single-qubit z-rotation gates, R z (9), in the output, The 
action of R z (9) on qubit x is represented as the string U RZ[6], x" where [9] is a string representing the rotation angle 
9. 



D. Processing 

Our circuit-design algorithm proceeds through three stages. 
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Hamiltonian sorting algorithm: mutually commuting terms in H arc grouped together resulting in parallelized 
quantum simulation that reduces total runtime. 

Trotter-Suzuki algorithm: U^ n '(t) := exp{—iH^ n H} is decomposed into a sequence of exponentials of Pauli 
operators. 

Circuit design algorithm for Pauli-cxponcntials: determine the quantum circuit and convert into output 
string [C]. 

These algorithmic stages are described in the following sections. 



E. Summary 



An algorithm consists of input and output with the output is obtained by processing the input. We have been 
careful to discuss the input and output as bit strings as our algorithm is classical and runs on a classical computer. 
The output of the algorithm is a design procedure to construct a quantum simulator circuit that would simulate the 
evolution of a quantum state. In the following sections we describe the algorithmic stages. 



III. TROTTER-SUZUKI FORMULAS 



In this section we discuss the second stage of algorithm, which concerns decomposing the TS ordered-operator 
exponential into a sequence of exponentials of tensor products of Pauli operators. The second stage of the algorithm 
is discussed first because the first stage groups these TS terms so understanding the second stage helps to understand 
the terms being grouped in the first stage. 

We use the TS method to determine a product of exponentials of that approximates U^ n \t) within 2-norm 



distance e [18H20I ] . The total time of evolution t is divided into r time intervals each of equal duration 



At 



t 



(9) 



For Ux the % th -order TS iterate approximating the n-qubit unitary evolution U^ n \ the distance between the 'true' 
evolution operator U^ n \t) and the TS approximated evolution operator is 

ft 2x+1 



U {n) (t)-U^ n) {At) r 
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(10) 



The iterative TS formula for generating U x is well known [201 ] . The formulae are widely used in quantum simulation 
algorithms because they generate an approximation 



U^ n) (At) = cxp j-mj^fi} exp [-ia h ^ ) t 2 j ■ ■ ■ exp \-ia ju ^tjvf} 



(11) 



which is a product of unitary evolutions, for Hamiltonians Fj^™ represented by the sequence ji, and a sequence of times 
U. The TS formulae comprises a sequence of exponentials of Pauli operators that have simple circuits for quantum 
computer implementation [22j . 

Specifically, the approximation f/x (£) i s constructed iteratively for the Hamiltonian ffW = Y^JLi a jhi^ y i a 



(At) = ft cxp \-ia s hf^] fl oxp {-ia^f} 
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with 
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s p - 4_4i/(2 P -i) 
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Algorithm 1 Trotter-Suzuki Algorithm. 

Input: 

[iS"'™']: bit-string representation of the Hamiltonian 
At: time duration 

X- iteration order of Suzuki's method [20l ] 
Output: 

Suzlnt: array of exponentials. 

function TrotterSuzukiQH'™'], At, x) 

return Suzlnt <— sequence of exponentials [[/£ n ^(At)J (|14p using Suzuki's procedure [20l |. 
end function 



and the integer p obeys 1 < p < x- We emphasize this form of the TS formula because it is important for our 
grouping algorithm that the order of the exponentials in the product formula matches the order of the exponentials 
in the Hamiltonian. 

We express this TS stage of the algorithm as an outline of a computer program. The program's input is the bit- 
string representation [£f( n )] of the the n-qubit Hamiltonian, the desired order of the Suzuki iteration the evolution 
time t and the number of intervals r, which together yield the time step At ([5]). The program's output bit-string 
representation for the x th -order approximation to the true evolution: 



J7<» 



(At) _ (ctj M , lj M , Sj M , tM){(lj M -i , Ijm-i ' ^jV-ii *Af— l) ' ' ' i a ji > Iji j Sji ? tl)- 



(14) 



with M = 2m5 x 1 following from the recursive form of the TS formula; (|12p [20[ . The representation in (fT4)) stores 

(n) 

each exponential in the approximation as a sequence of strings that represent a Hamiltonian term aj fj^ and then the 
duration of the evolution step (stored to finite precision) . Our representation stores the exponentials in order of their 
execution; although, in this case, the symmetry of the TS formula? implies that we could also store them in reverse 
order without changing the result. The TS algorithm that generates a TS approximation of the form (fT4)) is given by 
Algorithm [1] 

The performance of the resulting simulation depends strongly on the chosen values for r and x- If r — or x = 0, 
our program determines suitable values of r or x; otherwise this program uses the user-supplied values. Given a 
specified value of x, then 



(2m(5/3) x - 1 X 
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\t)) 



1 + 1/2* 
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is optimal [23|, [24| , which guarantees that |2S 



< - 

2 - 2 



provided that 



U {n) (t) - (U^(At) 
e < 2mx(5/3) x_1 max jo, |t 



(15) 



(16) 



(17) 



We employ the value of r in (fl"5j) as the default value of r for the algorithm and take our default value of x to be 



lo g25/3( max i N*A) 



(18) 



because it causes the number of operations in the simulation to scale nearly linearly with t [23| . 

The value of r given in (|15|) can be larger than necessary for certain Hamiltonians (as shown in Appendix [A| . If a 
guarantee that the error is less than the tolerance e is not required, then choosing r smaller than the optimal value 
given in (|15[) could suffice and thereby reduce runtime. 



IV. HAMILTONIAN SORTING ALGORITHM 



Now we return to the first stage of the algorithm, which aims to reduce runtime by grouping TS terms based on 
generation by commuting Hamiltonians. In other words, TS terms are grouped together to parallelize the quantum 
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Algorithm 2 Hamiltonian sorting algorithm. 

Input: 

[H^ n ']\ bit-string representation of the Hamiltonian 
n: number of qubits 

m: number of Hamiltonian terms summed to make [H^] 
Output: 

[^sorted]- sor ted representation of the Hamiltonian with mutually commuting terms combined in groups 

function SORTH([# (n) ], n, m) 
m s <— 1. 

Gi <r- (ai.ii, 5i). 
for j From 2 to m do 

isAssigned <— 0. 

for p from 1 to m a do 

if isAssigned = then > Checks if term commutes with terms in G p 

isAssigned <— 1 if J2 V jt w \SviCi S wj | for each (a*, U, Si) in G p . 
if isAssigned = 1 then 

G p <— Concatenation of G p with (dj,lj , Sj). > Assigns term to G p . 

end if 
end if 
end for 

if isAssigned = then > Checks if new "group" of commuting Hamiltonians is needed 

m s <— m s + 1. 

Gm s <— (<ij,lj, Sj). 
end if 
end for 

return [-ff SO rted] <— Concatenation of Gi, . . . , G ms ■ 
end function 



simulation circuit. The benefits of grouping terms and exploiting parallelism is most notable in the case of physically 
local Hamiltonians, where we show that parallelism typically leads to a near-quadratic improvement to the scaling of 
the time required to simulate the quantum system's evolution. 

The algorithm achieves this grouping by decomposing the Hamiltonian ([1]) into fh groups of terms as 

rh 

^- ) = Efli ,) »fii ,) = E# ) , (19) 



such that all fjj £ Gj mutually commute 

The Trotter-Suzuki algorithm can then 
of the terms in each commuting set that simulates 



The Trotter-Suzuki algorithm can then be used to determine a sequence of exponentials of , namely the sum 



cxp 



(-i£ (n) t)=exp (-if>?°t) (20) 



j'=i 



within error e. Product-formula approximations are not needed to decompose the exponential of each group into a 
product of exponentials of Pauli -operations as each fr in any given group mutually commutes. In other words 



exp 



(-ith) = cxp -i J = n exp (^ l) t) (21) 



for any value of t. 

Finally, we note that explicitly grouping the terms in into is unnecessary. Instead it suffices to sort the 
terms in the Hamiltonian by group membership (i.e., terms that are assigned to group G\ appear first in [i?( n '], then 
terms in G2 appear next and so forth). If we use (|12p to construct the TS formula;, then the resulting simulation will 
sort the steps in the simulation into commuting groups of operations. As commuting operations can often be executed 
in parallel, this procedure reduces the time required to execute the circuit in systems that can use parallelism. 
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In the first stage of the algorithm our program accepts the string Si, as introduced in Sec. HH to determine if 
two terms in commute. The Hamiltonians and i)j commute if and only if, for the dummy variables 

v,we{X,Y,Z}, 

\S VI n S wj \=0 (mod 2), (22) 

where | • | denotes the size of a set •. 

As a clarifying example, consider the Hamiltonian ([7]). The first two terms in ([7]) commute because of the anti- 
commutativity of Pauli-opcrators and because both terms have differing actions on an even number of qubits. Crite- 
rion (|2"2")l also tells us that they commute because 

Y,\SvinS w2 \ =\S xl nS y2 \ =2=0 (mod 2). (23) 

On the other hand, the second and third terms do not commute as 

J2 \Sv2 n S w3 \ = \S y2 n S z3 \ =1=1 (mod 2). (24) 

Criterion (j2"2")l accounts for the commutativity of Hamiltonians that act on disjoint sets of qubits as well as other 



commuting Hamiltonians such as the star and plaquette operators in the toric code [251 ] . A proof of the general 
validity of (1221) as a criterion for commutativity is given in Appendix |Bj If desired, a more restrictive grouping 
condition 



|S„ 4 n 5^1 = (25) 

can be used only to group together operations that have actions on disjoint sets of qubits. 

The criterion for commutativity in (f2"2"j) can be used to find an efficient classical algorithm for grouping {(jj" '} into 
groups of mutually commuting Hamiltonians, which we describe below formally. This grouping algorithm is efficient 



because m is polynomially large in n for local-Hamiltonians and (|22[) can be efficiently evaluated. 

The depth of the resulting quantum circuit depends on the value of m for the Hamiltonian. The reductions in 
the depth vary with the number of groups required for the Hamiltonian in question. The question can, however, be 
addressed for the cases of generic fc-local or physically fc-local Hamiltonians (by generic we mean fc-local Hamiltonians 
that include every possible p-body interaction for p < fc.). 

We estimate the worst-case scaling of m for generic fc-local by using the more restrictive grouping condition 
that fy\ and t)j are assigned to the same group only if condition (|2~5j) is satisfied. This requirement will generically 
result in a smaller value of m than our grouping algorithm will yield. In this case, at most \n/k\ fc-body terms can 
be assigned to each group for fc-local interactions. Terms with (fc — l)-body interactions (or fewer) can be neglected 
because n is assumed to be large and the vast majority of terms in the fc-local Hamiltonian are fc-body. Specifically, 
there are 0(n k ~ 1 ) terms with only (fc — l)-body interactions or fewer and 0(n k ) terms with fc-body interactions, 
which means that we can neglect terms that are only (fc — l)-local for generic Hamiltonians in the limit of large n. 

The number of fc-body Hamiltonians that can be assigned to each group before the Hamiltonians violate the 
grouping criterion (|22[) that scales as 0(n/fc), with the Bachmann-Landau notation for indicating that a function 
of this order is asymptotically bounded above and below by n/k up to multiplicative constants. For fc a constant 

Physically local Hamiltonians are constrained such that each qubit interacts with at most a constant number of qubits, 
which means that there are 0(n) fc-body terms present in physically fc-local Hamiltonians. Therefore, the number of 
groups required for physically-local Hamiltonians scales as 

m e O ( \ ) = 0(1), (27) 



n/k 



which we will see constitutes a nearly-quadratic reduction in the circuit depth for some simulations. 
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FIG. 1: Quantum circuit for implementing exp(— icf)X <g> Y ® 1 <g) Z) with _H" the Hadamard gate, T £ a concatenation of I 7r/8 
gates, and Rz(2(j}) a qubit rotation of tp about Z. 



V. IMPLEMENTING TROTTER-SUZUKI FORMULAS 



The main primitive element of the circuit-design algorithm is a basic circuit element Cg corresponding to a par- 
ticular sequence of gates in our gate set to simulate evolution due to each exp{— iajfy^ti} in the output of the 
subroutine described in Sec. IIIII The circuit construction that we use is an optimized version of that presented in {22j . 
Underpinning this primitive Ci is the conversion of Pauli gate operations to operations in our gate set • 

The simplest way to explain this step is by example: 

C/ £ ( ™ =4) (0) = exp {-i(j)X ®Y®t®Z) (28) 

shown in Fig.[TJ This circuit allows for a continuously parametrized phase-rotation gate R z (2(f>) = exp(— i(j)Z), but of 
course a proper quantum algorithm would work with a finite gate set. The gate Rz(2<f) can be reduced to a finite gate 
set using the constructive version of the Solovay-Kitacv algorithm [2(|. The gate-set variable w 6 {0, 1} determines 
whether the continuously-parametrized gate R z (2cf>) is permitted as a stand-alone gate or whether the Solovay-Kitaev 
algorithm needs to be implemented to construct a circuit from a finite set of gates. 

Our algorithm permits gates two be taken from two sets: one discrete and the other continuous. We label the gate 
sets 

m = Q: {H,T,CNOT}, 

w = l: {R, R z {9), CNOT: 6 e [0,2tt)}. 

The gate T is implicitly present in gate set xu = 1 because R z (ir/8) gives the 7r/8 gate up to a global phase. Our 
algorithm takes a logical variable as an input that specifies the gate set from which the gates in the output should 
be drawn and uses the Dawson-Nielsen algorithm [26j to convert the continuous rotation gates into discrete gates if 
gate set is chosen. 

The circuit primitive for Ce, as illustrated in Fig.Q] is constructed by first choosing one of the system qubits to be 

"(n ) " in) 

the 'parity qubit' whose role is to track the parity of qubits affected by f)^ when expressed in the eigenbasis of f)^ . 

This parity is required to perform U^ n \aj e t£) using diagonalization 04^- The parity qubit is always chosen to be a 
qubit that is non-trivially affected by the Hamiltonian, where we say that a qubit is trivially affected by a Hamiltonian 
if it acts as the identity on that qubit. This choice reduces the cost of computing the parity by two CNOT gates. 
We always choose the parity qubit to be the qubit with the largest label amongst this set. As an example, the fourth 
(bottom) qubit is chosen to be the parity qubit for the Hamiltonian in Fig. [TJ 

The method we use to construct the simulation circuit is given in Algorithm [3J Specifically this stage of the 
algorithm produces a bit-string representation for a circuit approximating exp{— ia,it)\ n ^ti} where Oif)!™'' is a term in 
H( n \ This routine employs a Solovay-Kitaev construction to go from a gate parametrized by a continuous angle 
to a gate constructed only from gates selected from the finite-size instruction set. Details of the Solovay-Kitaev 
construction are not discussed here as this algorithm has been provided by Dawson and Nielsen (26j . 

Algorithm [5] can be used to provide a string representation or each of the exponentials of Pauli-operators. By 
concatenating these strings together, our procedure constructs a bit-string representation of the overall simulation 
circuit for a single time step. The algorithm for this is provided below. 
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Algorithm 3 Circuit-design algorithm for Pauli-exponentials. 

Input: 

(di, li, Si,ti): bit-string representation of the exponential 
vj: determines which gate set should be used 
S: error-tolerance for the Solovay-Kitaev Algorithm 
Output: 

[Cj]: simulation circuit for exp(— if)\ U) 
function PCiRCUiT((ai, U, Si, U),w, 8) 

[d] <— 0. t> Sets [d] to the empty-string. 

for each string i in Si, x do 

[d] i— [d]H£. > Concatenates [d] with Hadamard on qubits that t)\ acts as X on. 

end for 

for each string £ in Si, y do 

[&] <- [Ci]T£T£T£T£T£T£T£T£H£. > Applies diagonalizing rotation to each qubit on which the term acts as Y. 

end for 

Cax <- max(£ £ Si). 

for each I in Si \ {i?max} do > Identifies parity qubit. 

[d] «- [Ci]CNOT£,^ max . 
end for 
if \vj=1 then 

SK <— output of the Solovay-Kitaev Algorithm for R z (2a.iti) acting on qubit £ max with error tolerance S. 
[d] <- [&]SK. 
else 

[d] <- [C 1 ]Kl{2a i t i ),£ max - 
end if 

for each £ in Si \ {Cax} do 
[&] <- [&]CNOTl,l max . 
end for 

for each string £ in Si, y do 

[d] <- [Ci\H£T£TL 
end for 

for each string £ in Si lX do 

[d] *- [C*]H£. 
end for 
return [d]. 
end function 



VI. MAIN ALGORITHM 
A. Complete Procedure 

We now construct the circuit-design algorithm, which employs the programs described in the previous three sections. 

The efficiency of our main algorithm depends on whether r and 2to5 Xo_1 are polynomially large. It is straightforward 
to see by substitution that the default values used for both of these quantities scale polynomially with the simulation 
parameters, and hence is efficient. On the other hand, if the default values of r and x are n °t used then the above 
algorithm may not be efficient. 

B. Cost Estimates 

We now provide upper bounds for the scaling of the circuit-size of the circuits yielded by our design algorithm 
using the default values \o an d r$ , which are chosen to guarantee that the simulation time scales near-linear ly with t 
and the error is at most e/2 respectively. There are three costs that we consider: the number of gates in the resulting 
circuit, A^op, the time required to execute the circuit using parallelism, r, and the number of qubits required which in 
our case is trivially n. We assess the remaining costs A^ op and r below. 

The value of A^ op is bounded above by the number of circuit primitives, Cg, needed to simulate the evolution multi- 
plied by the maximum cost of implementing a circuit primitive. The cost for implementing each circuit primitive Cg 
for a fc-local H requires at most lOfc single-qubit gates, 2k — 2 CNOT gates and one Rz rotation. This worst-case 

~ (n) 

estimate is given by the cost of simulating a fr that acts as Y on k qubits because F-interactions are the most 
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Algorithm 4 Main Algorithm 



> r — guarantees error is at most e 
guarantees near-linear time scaling. 



Input: 

[H^\: bit-string representation of the Hamiltonian 

n: The number of qubits 

t: evolution time 

e: error tolerance 

r: number of time steps 

X- iteration order of TS formula l> x 

■uj: logical value that indicates whether the discrete or continuous gate setis used 
Output: 

[C]: simulation circuit for exp(— iH\ n t) 
function Main([J/ (n) ], n, t, r, x, e, 

[Ctcmp] <- 0- 

m <— number of terms in H (n \ 
[H (n) ] <r- SortH([tf (n) ], n,m). 
amax <— maxi |0j|. 
if x — then 



end if 
if r = then 



r <r- 

end if 



r 2(2m(5/3) x ~ 1 xttmaxt) 1 + 1/(2x) 
| ( E /2)!/(2x) 



if e > 2mx(5/3) x_1 maxj |ey|t then 

e <— 2mx(5/3) x_1 maxj |aj|£. 
end if 

Suzlnt <- TrotterSuzuki([J/ (n) ], ±, X )- 
for j = 1 to 2m5 x_1 do 

[Ctcmp] «- [Ctcmp] PCircuit (Suzlnt(j), 
end for 

[C] <— [Ctcmp]- 

for j = 1 to r — 1 do 

[C] <- [C] [Ctcmp]. 

end for 
return [C]. 
end function 



(4m5X _1 r) 



> See Alg. H 



t> Computes default value of x 



> Computes default value of r 



> See Alg. [T] 
> Finds circuit for one time step. 

> See Alg. 



> Finds complete simulation circuit. 



expensive to simulate using our algorithm for finding Ce- The Solovay-Kitaev algorithm of Dawson and Nielsen [2£ 
gives the cost of implementing the continuous gate Rz{2(j>) within tolerance e/(4m5 Xo_1 r-o) as 



Nsk&0[ log 



A /4m5»- 1 (2™(5/3)» , - 1 xa ma xi) 1+1/(2x ° ; 
V 2e(e/2) 1 /(2xo) 



As 



Xo 



lo g25/3( ma maxi/e) 



(29) 



(30) 



have that log(5 Xo ) e 0(y/\og(ma max t/e)). Then using the properties of logarithms, we find that 

iVsK € O | log 



(31) 



The total number of operations used to implement each primitive circuit Ci within tolerance e is thus 0(k + Nsk)- 
As C = Cm ■ ■ -Ci, the total number of operations in C scales as 0(M(k + Nsk))- Finally, C/W(t) is simulated by C r 
so the total number of operations scales as 



N op eO(2m5 Xa - 1 r (k + N SK )) 



(32) 
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We then substitute the value of r$ into this expression to find that, 

M g (k + N SK )m 2 +°W (max, |a^) 1+o(1) 

Iy °P fc £ o(l) ' ^ 

with e°^\ m ^, and (max, It) ^ 1 ^ representing quantities that scale sub-polynomially but not quite poly- 
logarithmically. As Ask varies sub-polynomially with respect to all parameters and k is a constant, Ask + k can be 
incorporated into (jnmax.; \ai\t / ef^ in Relation (|33l) . This leads to the conclusion that 

AT (fc + A SK )m 2 +°W (max, |«^) 1+o(1) m 2 +°W (max, |q t |t) 1+o(1) 

The performance of our circuit-design algorithm is enhanced if a reasonable extra restriction is placed on the /c-local 
H. The upper bound m is different for fc-local vs physically fc-local H as we now see. For fc-local H^ n \ 

m<J2^( n ) eO(n% (35) 

q=l 

because there are at most 3 9 (") q-body terms in H^ n ' for q = 1, . . . , k. The scaling m € 0(n k ) arises from standard 

inequalities for binomial sums and, as A: is a constant, then so is 3 k . If is physically fc-local, m € 0(n) because 
each qubit interacts with at most a constant number of neighbors. 

We can then eliminate m from (|3"3")) by noting that, if H^ n ' is fc-local, then m € 0{n k ) for k a constant, and 

^ n*(W» (max,|q,|t) 1+ °W 
A op G ^ . (36) 

If is physically fc-local and k is constant, then m <G O(n). We substitute this scaling into ([33| to obtain 

^ (max^q.^) 1 ^ 1 ) 

Comparing to ([3T|) shows that the simulation cost is dramatically reduced for physically fc-local rather 

than just fc-local. This cost reduction does not occur from a modification of the algorithm, but rather a more careful 
costing of the performance of our algorithm. 

In fact the circuits generated by our algorithm are optimal, or near-optimal, in three distinct ways. First, they 
exhibit near-optimal scaling with t because linear scaling is known to be a lower bound for general quantum simula- 
tion [l8l . [l9l [23| . Second, they have optimal space complexity because a minimum of n-qubits of memory is needed 
to simulate the quantum dynamics of an n qubit system. Finally, the n-scaling of ( (|36[) ) and (|37p is unlikely to be 
surpassed by other general purpose TS-based simulation algorithms because the scaling with n is derived from the 
value of m for the Hamiltonian [IH, [l9| . 

The fact that better scaling with n cannot be obtained by using a superior decomposition method for the Hamil- 
tonian follows from Vizing's edge-coloring graph algorithm (28j . which states that a graph with maximum degree d 
cannot be colored using fewer than d colors. This implies that a q-sparse Hamiltonian can be decomposed into at best 
d one-sparse matrices. A fc-local Hamiltonian can be at most 0(n k ) sparse, which implies that 0{n k ) terms will be 
present in the Hamiltonian using the optimal decomposition method. This value of m coincides with the value that 
our algorithm finds for simulating fc-local Hamiltonians; hence our algorithm is unlikely to be significantly surpassed 
by other algorithms that use similar strategies. 

Now we will examine the scaling of the time required to implement the resultant quantum circuits on quantum 
computers that can exploit parallelism. Without grouping, the depth of the quantum circuits yielded by our algorithm 
scales with m is at worst m 2+ °( 1 \ Although our grouping step does not change the circuit size, it causes the depth 
of the resulting circuits to scale as fhm 1+ °( 1 \ where m may be smaller than m. 

The factor of m 1+o( - 1 - ) remaining in the scaling comes from the upper-bound used to estimate error in the Trotter- 
Suzuki formulas, which does not change if grouping is used. Thus, parallel execution of the exponents in each group 
can be used to reduce the scaling of the execution time of the quantum simulation, r, for fc-local Hamiltonians to 

W fc(2+ (D)-1 (maXi \ ai \ t ) 1+ °W 
TG ^(1) ' ( 38 ) 

and for the case of physically fc-local Hamiltonians it becomes 

n 1+ °W (maxi |ai|t) 1+o(1) , . 

^ e o(i) . ( 39 ) 

which scales nearly-quadratically better with n than what we would expect if grouping were not used. 
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FIG. 2: Kitaev's honeycomb lattice with each vertex representing a physical qubit and each edge representing an interaction 
between two qubits denoted by x-, y-, and 2-links. 

VII. EXAMPLES 

We now examine the performance of our circuit construction algorithm when applied to simulating the quantum 
dynamics of two important physical systems. Specifically, we examine simulating Kitaev's Honeycomb model and 
pairing models similar to the Bardeen-Cooper-Schreifer model of superconductivity. 

A. Simulating Kitaev's Honeycomb Model 

Now we consider two important examples that demonstrate the efficacy of our procedure. For the first example, 
consider Kitaev's honeycomb model [25{ described by the Hamiltonian 

H (n) = -J x J2 x i x o- J y J2 Y i Y i- J * Z ^ 

x— link y— link z— link 

with the links shown in the honeycomb-lattice representation depicted in Fig. [2] Although exactly solvable [25j . 
the ground state has applications for topological error correction and is difficult to experimentally prepare. Our 
simulation circuits can then be used (in conjunction with a ground-state preparation method such as adiabatic state 
preparation fl7j or the Abrams Lloyd algorithm [13]) to prepare the ground states of such Hamiltonians. 

The resulting sequence of exponentials yielded by our circuit design algorithm yields a sequence of exponentials 
of XiXj, YiYj and Z^Zj. Figure [3] gives the simulation circuits that our algorithm uses to simulate each of these 
exponentials. We can use these diagrams to find the number of operations used in the simulation by using the fact 
that each term in the Hamiltonian appears 2(5) x_1 times in the TS formula and that there are n/2 different XX, 
YY and ZZ interaction terms in the Hamiltonian. 

As r TS formulas are used in the simulation, the total number of times each of these three types of interactions 
appears is n5 x_1 r. The total number of gates required for the simulation can be found by multiplying the number of 
interactions of each type by the number of gates needed to simulate that type of interactions (explicit constructions 
for these circuits are given in Fig. [3]). The total number of operations required to simulate the Honeycomb model is 
summarized below. 

• Hadamard gates: 8n5 x ~V, 

• T gates: 16n5 x_1 r, 

• Z-Rotation gates: 3n5 x_1 r, 

• C-NOT gates: 6n5* -1 r, 

where r is the number of time steps used in the simulation and x is the iteration order of the Trotter-Suzuki formula 
used in the simulation. If r is chosen as per (TT5| then the error is promised to be less than e/2 given e is sufficiently 
small [l8l [l9j: however, other choices of r are possible if rigorous guarantees that the simulation error is less than e 
are not desired. 
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U T 2_ 



U T 2U 



(a) 



H 



R z {24>) 



H 



W — H 



(6) 



e- ^(20) 



(c 



FIG. 3: Circuits for simulating evolution due to exponentials of the terms present in the honeycomb model or pairing models: 
(a) simulates e~ lYiY ^ (b) simulates e ~ tXiX ^ and (c) simulates e - lZ i z j^. Exponentials of the form e~ lZp4> can be trivially 
simulated with an R z rotation. 



We can then directly estimate the scaling of the circuit size with the simulation parameters if x & n d r are taken to be 
the default values for our algorithm. Kitaev's honeycomb- model Hamiltonian is physically two- local with m < in/2. 
Combining the observation that 

max|ai| < max{\J x \ 7 \J y \, \J Z \} , (40) 

i 

with p7|) implies that our circuit-design algorithm yields C for simulating (t) within error tolerance e and with a 
circuit size that scales as 

N op G n 2 +°^ (max{| J x \, \J y \, \ J z \}t) 1+o{1) /e°«, (41) 

elementary gates from Q acting on only n qubits. This scaling is significantly better than previous algorithms, 
which had a bound on the number of gates in 0(n A log* n) and utilize quantum oracles that may be difficult to 
implement [H,[I3|. 



B. Simulating Pairing Models 

Our second example simulates general pairing Hamiltonian evolution, which is central to studies of superconductivity 
in many-body systems. A notable example of such Hamiltonians is the BCS Hamiltonian, which describes the 
interaction of electrons on a lattice according to [2!| l30j 

^ n n 

H B cs = 2 J2 £ p( a p a P + a - P a -p) + H ^ a p 6 -jA a -ii ( 42 ) 
p— i pj— i 

where a p is the fermionic annihilation operator for a fermion in states p = (p, t) arL( l — p = (—p,l) with p the 
particle's momentum and f and -J, its spin, N p is the number operator for fermions in state p, £ p is the on-site 
interaction strength, and V p i is the interaction strength between neighboring fermions. 

The general BCS Hamiltonian (|42p can be mapped to a spin system by using each spin to represent the presence or 



absence of a fermion in that mode. Wu, Byrd and Lidar [171 ] show that a large class of pairing models that subsume 
the BCS Hamiltonian can be expressed as 



n n 



2 

p=l r=± i>p=l 
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for X p , Y p and Z p the Pauli X, Y and Z operators applied to qubit p. Specifically, the BCS Hamiltonian has V pl = 
and 7 P = £ p + V pp . 

As in the previous example, our circuit design algorithm yields a sequence of exponentials of the different terms in 
the Hamiltonian. The circuit implementations of the resulting exponentials can also be seen in Fig. [3] By multiplying 
the total number of exponentials of each type by the number of gates used in the implementation of each exponential 
and collecting the result, we find that our algorithm yields a circuit containing the following numbers of gates: 

• Hadamard gates: 16n(n — 1)5 X r, 

• T gates: 32n(n - l)5* -1 r, 

• Z- Rotation gates: 2n(2n — l)5 x_1 r, 

• C-NOT gates: 8n(n - l)5 x " 1 r, 

where the algorithms cited scaling is found by choosing r as per Eq. (|15[) and substituting the asymptotic scaling of 
r for x chosen approximately optimally (l8l . Il9| . The value of x that reduces the total number of gates can be found 
by minimizing the sum of these gate counts over all X- 

Previously, circuit design yielded circuits with 0(n 5 t 2 ) gates with no promises about the accuracy of the simula- 
tion [l7j . H p is two- local and hence m € 0{n 2 ), which implies that our algorithm yields a circuit with complexity 

n 4 +°« (max p ,, r {| 7p |,|^|}i) 1+ ° (1) 
e * -® 1 , (43) 

when the default values of r and x are used. 

Our algorithm can thus generate efficient quantum circuits for simulating the dynamics of general BCS Hamiltonians. 
Our resultant circuits can be used in conjunction with eigenvector simulation techniques or adiabatic state preparation 
to approximate the ground state of a quantum system and can thus be useful for autonomously determining whether 
classes of pairing models afford exotic types of superconductivity. 



VIII. CONCLUSION 



In conclusion, we design an efficient classical algorithm for autonomous construction of efficient quantum circuits 
to simulate state evolution. The circuits are costed in terms of a small standard gate set and requires neither a 
Hamiltonian oracle nor ancillary qubits, thereby making the universal quantum simulator minimal in space cost. The 
algorithm also systematically searches out commuting terms in the Hamiltonian and groups them to reduce the depths 
of the circuits yielded by our algorithm; thereby reducing the time required to execute the circuits using parallelism. 

Our circuit construction algorithm is a significant advance because it is straightforward to implement the algorithm 
on a computer, it is highly efficient and it gives upper bounds for the simulation error. Knowing error bounds is 
important for assessing the veracity of any quantum simulation. Our work thus provides an important step towards 
constructing a practical, efficient and trustworthy simulator of quantum dynamics. 

There are several remaining open problems that have not been addressed by this work. First is the issue of simulating 
time-dependent quantum systems. Such issues can be resolved by employing Trotter-Suzuki formulae for ordered 
operator exponentials (23l . l3l| although the optimality of those error bounds for actual circuits remains to be checked 
using algorithmic methods we have introduced here. 

Similarly, providing better upper bounds for r remains an important problem as our work has shown that in 
physically significant cases these bounds can be far too pessimistic. Providing such bounds would enable much more 
challenging simulation experiments to be performed in cases where rigorous error bounds are required of the output 
states. 

A final important extension of this work is discussing the optimization of the resulting circuits that are yielded by 
the algorithm. Further optimization should be possible by using circuit identities to simplify the resulting circuits. 
Finding autonomous methods to optimize the output of such algorithms would be a significant asset in the development 
of a DQS that exceeds the power of existing classical simulators of quantum systems. 



Appendix A: Numerical Estimates of Error in Low— Order Trotter— Suzuki Formula? for 2-Local Hamiltonians 



Decoherence and gate error present substantial obstacles to constructing a UQS in current experimental implemen- 
tations. This means that it is important to ensure that the number of gate operations needed is minimal for the error 



15 



so 



10 



10 



10 



10 



10 - 

io 5 1 



~ H n=4 (upper bound) 



- 1 ■ 



B- 



ST 



10 

-9 

10 



10 



0.0001 



0.001 



0.01 



0.1 



t/r 



FIG. 4: This plot shows the mean error or mean upper bound for the error incurred when using the lowest-order Trotter- 
Suzuki formula on 50 randomly generated two-body Hamiltonians acting on 4 qubits that were sampled from our Gaussian 
ensemble of Hamiltonians. This result shows that the lowest-order Trotter-Suzuki formula given in (|A1[) can be incorrect by 
as much as six orders of magnitude when applied to simulations of 2-local Hamiltonians. 



tolerances required. Previous work has estimated the number of exponentials needed to rigorously guarantee that the 
error in the Trotter-Suzuki expression at most saturates the required error tolerance; however, we show here that the 
error bounds used to derive the default value of r are too loose by many orders of magnitude for a large set of realistic 
many-body Hamiltonians. 

This separation is illustrated in Fig.2]by comparing an upper bound on the error in a single application of the Strang 
splitting formula to the computed error for Hamiltonians chosen randomly from an ensemble of 2-local Hamiltonians 
that have their aj independently distributed according to a Gaussain with mean zero and variance one. As the upper 
bound in question is used to estimate the number of time steps, r, needed in the simulation we conclude that upper 
bounds for r cited in [l8|, [l9|, [23| are too loose for some practical cases. 

We estimate the error invoked by using the two lowest-order Trotter-Suzuki formulas by numerically sampling 
random 2-body Hamiltonians taken from our Gaussian ensemble. We then measure the error in the formula by the 
2-norm of the difference between the formula and the correct exponential for the randomly generated Hamiltonian. 
We then repeat this for fifty different randomly generated Hamiltonians for evolution of duration t= 10~ 4 to t = 10 _1 . 
We repeated these numerical experiments on 2 through 8 qubits. Higher-order integrators can be studied similarly, 
but low-order integrators are often the most significant for the current generation of experiments (32[. 

Using the error bound for the Strang-splitting (denoted u[ n ^) in [23| we find that 



(n) 



exp 



r 



< 2 



/ 3mmaxj \cii\t 



2r 



(Al) 



We find by numerical integration that the bound is, on average, too loose by approximately six orders of magnitude 
for t £ [10~ 4 , 10 _1 ] for two-local Hamiltonians. Wc find by fitting the resulting data that the average error is well 
modeled for randomly sampled 2-local Hamiltonians acting on 2-8 qubits by 



2(||ffW||i 



m 



(A2) 



for u[ n ^ the lowest order Trotter-Suzuki formula. Most of the randomly selected Hamiltonians yield accuracies within 
an order of magnitude of the above expression. 



16 



s. 
-t-i 

e 



10 



10 



10 



10 



10 



10 



10 



~« — n=2 

~ B n=2 (approximation) 
n=4 

- :* - - n=4 (approximation) 
■ + ■ - n=8 

* n=8 (approximation) 



0.0001 



0.001 



0.01 



0.1 



FIG. 5: This plot shows the mean error in the lowest -order TS formula as a function of the duration of each time step and 
the approximation (|A2[l where each point is randomly drawn from a Gaussian ensemble of 2-body Hamiltonians. Error bars 
are computed via the standard deviations of the measured results for the ensembles and only the upper bar is plotted because 
the standard deviation is comparable or exceeds the mean value for all of these data points. The data are therefore within 
statistical error of our approximation. 



Similarly, the second lowest order Trotter-Suzuki formula was also bcnchmarked using random local Hamiltonians. 
We find from the data that for random 2-local Hamiltonians that 



U. 



(n) 



8.5 ( \\hW\ 



,1.5 



(A3) 



We can also find approximate forms for the error scaling for higher-order integrators, although doing so becomes more 
difficult as numerical precision restricts the range of t that can be used to assess the scaling. Additionally, we focus 
on these two integration formula as they are often the two most practical to use in non-fault-tolerant simulations 
where gate error limits the utility of high-order product formulas [321 ] . 

The total number of time steps needed to approximately simulate the Hamiltonian evolution within Trotter-error 



e/2 using [/■[ or follow directly from expressions (|A2|) and (|A3J) . As each operator in the product formula is 
unitary the simulation errors are at most additive throughout the evolution. This implies that for the case where ll[ 
is used we want 



'2(||ff(»)||i) 3 



e 

< -. 
- 2 



(A4) 



Solving for r gives the following approximate requirement for the number of time steps needed to satisfy the required 
tolerance 
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FIG. 6: This plot shows the mean error in the second-lowest-order TS formula as a function of the duration of each time step 
and the approximation ()A3|) for a set of 50 randomly generated 2-local Hamiltonians sampled from our Gaussian ensemble for 
each data point. We observe that the data is within statistical error of the fit in ()A3|) . 



The corresponding approximate requirement on r for i s 
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In general, these bounds suggest that l/[ provides a shorter sequence of exponentials than the sequence consisting 



of r U2 formulae if 
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(A7) 



which is guaranteed if 



e > 



64 




t 


] 





(A8) 



If inequality (| A8|) is not satisfied, then higher-order formulae such as are likely to yield more efficient simulation 
circuits. Present simulation experiments, however, are often confined to short evolutions with relatively large error 
tolerances (because direct comparison with numerical results is possible). As a result, low-order approximations 
remain relevant for present experiments. 

As a final note, we remark that these estimates of r depend on the spectral norm of the Hamiltonian, which in 
general can be very difficult to approximately compute if the corresponding matrix has non-positive entries. In such 
circumstances, it may be preferable to use the bound < mmax,-|a,| to estimate the required value of r. 

Although, these estimates for r may not be substantially tighter than (|15p. 
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Appendix B: Condition for commutation of Pauli operators 

Here we prove that (|22[) gives a necessary and sufficient criterion for determining whether two Hamiltonians that are 
tensor products of Pauli operators commute. This criterion is essential to our discussion of parallelizing the quantum 
simulation because we need to divide the simulation into mutually commuting sections in order to exploit parallelism. 

We simplify our discussion by removing from fjj n ' and f)j- n ' every qubit that is cither acted on as the identity 
operator by at least one of the Hamiltonians or every qubit that both Hamiltonians have the same action upon. These 
qubits are removed from consideration because they are not needed to determine the commutation properties. After 
removing these irrelevant qubits, we simplify the discussion by relabeling the remaining qubits to group them in six 
different groups. These simplifications lead to the following representation for f)'™' 



The simplified expression for f)^"' is 



(n) ^ y®|Sx<nS v f| z ®\S yi ^S,j\ ^ X ®\S^r\S^\ 

^ z ®\Sx,^S z] \ ^ jf®|S B< n^l (g Y®\ 8zi ft 8 rt\. (B2) 



We can then evaluate the product of the two operators using the property that XY = iZ, YZ = iX and ZX = iY 
and also using the anti-commutativity of Pauli-opcrators. This implies 



3 (_jy)®is„ns, 3 i ^ (_^)®is vi ns«,i (_ iX )8is, 4 ns,ii. ( B 3) 



Conversely, 



®(ir)®i s -ns 2J i ^ ^^is^ns^i ^^i^ns^i. ( B4 ) 

We then see from (JB3J) and (B4|) that [f^ n) , fj( n) ] = if and only if 



\S Vi r\ S xj \+\S xi n S zj \+\S zi n S xy \ = \ S xi n 5„ , | + \S zi n S XJ - 1 + 1 s y . n 5»„ | (mod 2) , 



(B5) 



which is equivalent to 
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